Optical Character Recognition from Degraded Document Images

Author

  • P.R.Nisha Beevi
Abstract

Segmenting text from badly degraded document images is a very challenging task because of the high inter- and intra-image variation between the document background and the foreground text across different types of document images. In this paper, a novel document image binarization technique based on adaptive image contrast is used to address these issues. The adaptive image contrast is a combination of the local image contrast and the local image gradient, and it is tolerant to text and background variations caused by different types of document degradation. An adaptive contrast map is first constructed for the input degraded document image. The contrast map is then binarized and combined with Canny’s edge map to identify the text stroke edge pixels. The document text is further segmented by a local threshold that is estimated from the intensities of the detected text stroke edge pixels within a local window. Vertical scanning is then applied to the binary document image to find the number of text lines, followed by horizontal scanning to find the number of characters. Finally, each character is recognized using the discrete wavelet transform.
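The abstract does not say how the vertical and horizontal scanning are carried out. One common reading is a projection-profile scan: count text pixels per row to separate lines, then per column within each line to separate characters. The following is a minimal sketch under that assumption, not the paper's implementation; the function names (`find_runs`, `segment_lines`, `segment_characters`) and the toy image are hypothetical, and the input is assumed to be an already binarized image with 1 for text and 0 for background.

```python
def find_runs(profile):
    """Return (start, end) pairs, end exclusive, for each run of non-zero counts."""
    runs, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # a run of inked rows/columns begins
        elif count == 0 and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:                  # run extends to the image border
        runs.append((start, len(profile)))
    return runs


def segment_lines(binary):
    """Scan down the rows: each run of rows containing text pixels is one line."""
    row_profile = [sum(row) for row in binary]
    return find_runs(row_profile)


def segment_characters(binary, top, bottom):
    """Scan across the columns of one line: each inked run is a character candidate."""
    width = len(binary[0]) if binary else 0
    col_profile = [sum(binary[r][c] for r in range(top, bottom)) for c in range(width)]
    return find_runs(col_profile)


# Toy binary image (assumption: 1 = text pixel, 0 = background).
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

lines = segment_lines(page)
chars_per_line = [segment_characters(page, top, bottom) for top, bottom in lines]
print(len(lines), [len(chars) for chars in chars_per_line])
```

A plain projection scan like this treats touching characters as one candidate and assumes unskewed text lines, so a real pipeline would likely need extra handling for both cases.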


Similar resources

Degraded Document Analysis and Extraction of Original Text Document: An Approach without Optical Character Recognition

Document Image Analysis recognizes text and graphics in documents acquired as images. An approach without Optical Character Recognition (OCR) for degraded document image analysis has been adopted in this paper. The technique involves document imaging methods such as Image Fusing and Speeded Up Robust Features (SURF) Detection to identify and extract the degraded regions from a set of document i...


Extraction of Original Text Document from a Set of Degraded Text Documents from the Same Source

Information extraction is the task of extracting structured data from a degraded document. It includes extracting data such as text, images, or graphics from sources such as images, video, or documents. Text detection and extraction from degraded documents finds application in a wide range of studies. In this paper, an Optical Character Recognition-less (OCR-less) method of obtaining an origi...


A Quad Tree Based Binarization Approach to Improve quality of Degraded Document Images

This paper proposes a novel binarization algorithm for converting grayscale and color images into black-and-white images. Binarization is one of the most important processes in research on document image processing and pattern recognition. Since the quality of the binary image plays a critical role in the further processing of the document, especially in the ...


Binarization of Document Image

Document image binarization is performed in the preprocessing stage of document analysis and aims to segment the foreground text from the document background. A fast and accurate document image binarization technique is important for the ensuing document image processing tasks such as optical character recognition (OCR). Though document image binarization has been studied for many years, t...


Threshold Approach to Handwriting Extraction in Degraded Historical Document Images

Handwriting extraction is the ability of a system to receive and interpret intelligible handwritten input from sources such as documents, photos, touch screens, and other devices. The image of the written document is used to detect the written text by optical scanning, known as optical character recognition. Handwriting extraction basically uses optical character recognition. Conversely, ...




Publication date: 2014